(54) Very long instruction word processor

(57) A parallel processor that performs efficient parallel
processing is provided. The parallel processor, which performs
parallel processing of one or more basic instructions contained
in each of instruction words delimited by instruction delimiting
information, includes: a plurality of instruction execution units
that perform processes corresponding to supplied basic
instructions in parallel; an instruction fetch unit that fetches
the instruction words one by one in accordance with the
instruction delimiting information; and an instruction issue unit
that issues each of the basic instructions contained in each of
the instruction words fetched by the instruction fetch unit to a
corresponding one of the instruction execution units.
Description

BACKGROUND OF THE INVENTION 

1. Field of the Invention

[0001] The present invention generally relates to processors,
and, more particularly, to a parallel processor that executes a
plurality of basic instructions in parallel.

2. Description of the Related Art 

[0002] Generally, in a conventional computer sys- 
tem, a plurality of basic instructions are executed in par- 
allel by pipeline processing, thereby improving its 
performance. Conventionally, a plurality of basic instruc- 
tions constitute a fixed-length instruction word, and a 
very-long instruction word (VLIW) technique is 
employed as a method for executing a plurality of basic 
instructions contained in one instruction word in parallel. 
Also, a super scalar technique may be employed. In 
accordance with the super scalar technique, basic 
instructions are executed in parallel depending on the 
number of basic instructions contained in each instruc- 
tion word. 

[0003] FIG. 1 shows the structure of a conventional 
parallel processor 10. This parallel processor 10 com- 
prises an instruction fetch unit 1 connected to a memory 
7, an instruction issue unit 3 connected to the instruc- 
tion fetch unit 1, instruction execution units EU0 to EUn 
each connected to the instruction issue unit 3, and a 
register unit 5 connected to each of the instruction exe- 
cution units EU0 to EUn. 

[0004] The instruction fetch unit 1 fetches an instruction word
from the memory 7, and supplies the instruction word to the
instruction issue unit 3. The instruction issue unit 3 issues the
basic instructions contained in the supplied instruction word to
the instruction execution units EU0 to EUn. If the instruction
execution units EU0 to EUn are still executing previous basic
instructions at this point, the instruction issue unit 3 waits
for the end of the execution. When the execution ends, the
instruction issue unit 3 supplies the basic instructions to the
instruction execution units EU0 to EUn.

[0005] The instruction execution units EU0 to EUn execute the
basic instructions, and notify the instruction issue unit 3 of
the end of the execution. The register unit 5 supplies data to
the instruction execution units EU0 to EUn, if necessary, and
holds the execution results of the instruction execution units
EU0 to EUn. The externally connected memory 7 stores an
instruction word string to be executed in the parallel processor
10. The memory 7 also stores the data that the execution units
EU0 to EUn need in order to execute instructions, and the data
produced as execution results.

[0006] FIG. 2 shows the formats of instruction words to be
supplied to a parallel processor having four instruction
execution units EU0 to EU3. As shown in FIG. 2, each instruction
word is made up of basic instructions EI and do-nothing
instructions NOP. If the number of basic instructions contained
in one instruction word to be executed in parallel is smaller
than the number of the instruction execution units EU0 to EU3,
the proportion of do-nothing instructions is large.

[0007] In the conventional parallel processing method of
executing a plurality of basic instructions by the VLIW
technique, each instruction word has a fixed length. Therefore,
if the number of basic instructions to be executed in parallel is
smaller than a predetermined number, do-nothing instructions are
added to comply with the predetermined length. Because of that,
in a program having a small number of basic instructions in
total, the proportion of do-nothing instructions is large, and
the amount of instruction code increases accordingly, resulting
in problems such as poor usage efficiency of memory, a decrease
of the hit ratio of cache memory, and an increase of the load on
the instruction fetch mechanism.
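
The padding overhead of a fixed-length instruction word can be
illustrated with a minimal sketch. The four-slot width matches
the four execution units of FIG. 2; the sample program and the
function names are hypothetical and chosen only for illustration.

```python
def fixed_length_slots(words, width):
    """Slots stored when every instruction word is padded to `width` with NOPs."""
    if any(len(w) > width for w in words):
        raise ValueError("an instruction word exceeds the fixed width")
    return len(words) * width


def useful_slots(words):
    """Slots that actually hold basic instructions."""
    return sum(len(w) for w in words)


# Hypothetical program: four instruction words holding 1, 2, 1 and 4
# basic instructions (EI) respectively.
program = [["EI"], ["EI", "EI"], ["EI"], ["EI", "EI", "EI", "EI"]]

fixed = fixed_length_slots(program, width=4)
useful = useful_slots(program)
nop_fraction = (fixed - useful) / fixed  # share of stored slots that are NOPs
```

In this hypothetical program half of the stored slots are NOPs,
which is the kind of code-size growth the paragraph above
attributes to the fixed-length format.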

[0008] With the super scalar technique, there is also a problem
that a large-scale circuit is needed to increase the number of
instructions to be executed in parallel.

SUMMARY OF THE INVENTION 

[0009] A general object of the present invention is to provide
parallel processors in which the above disadvantages are
eliminated.

[0010] A more specific object of the present invention is to
provide a parallel processor that is capable of performing highly
efficient parallel processing.

[0011] The above objects of the present invention are achieved
by a parallel processor that performs parallel processing of one
or more basic instructions contained in each of instruction words
delimited by instruction delimiting information, the parallel
processor comprising:

a plurality of instruction execution units that perform
processes corresponding to the supplied basic instructions in
parallel;
an instruction fetch unit that fetches the instruction words one
by one in accordance with the instruction delimiting information;
and
an instruction issue unit that selectively issues each of the
basic instructions supplied from the instruction fetch unit to
one of the instruction execution units to execute the basic
instruction.

[0012] With the parallel processor having the above structure,
the instruction fetch unit fetches the instruction words one by
one in accordance with the instruction delimiting information, so
that the length of each instruction word can be made variable.
Also, the instruction execution units can efficiently execute the
instruction words, because each of the basic instructions is
selectively issued to a corresponding one of the instruction
execution units.

[0013] The above and other objects and features of the present
invention will become more apparent from the following
description taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS
[0014] 

FIG. 1 shows the structure of a conventional parallel processor;
FIG. 2 shows the formats of instruction words to be supplied to a
conventional parallel processor having four instruction execution
units;
FIG. 3 shows the structure of a first example of a parallel
processor in accordance with a first embodiment of the present
invention;
FIG. 4 shows the structures of an instruction fetch unit and an
instruction issue unit of the parallel processor shown in FIG. 3;
FIG. 5 shows the formats of instruction words to be supplied to
the parallel processor of the first embodiment of the present
invention;
FIG. 6 shows the structure of a second example of the parallel
processor in accordance with the first embodiment of the present
invention;
FIG. 7 shows the structure of a first example of a parallel
processor in accordance with a second embodiment of the present
invention;
FIG. 8 shows the structures of an instruction fetch unit and an
instruction issue unit of the parallel processor shown in FIG. 7;
FIG. 9 illustrates basic instruction rearrangement in the
parallel processor of the second embodiment of the present
invention;
FIG. 10 is a circuit diagram of a conversion unit in the parallel
processor shown in FIG. 7;
FIG. 11 is a circuit diagram of the conversion unit in a case
where the maximum basic instruction word length is 4;
FIG. 12 shows the structure of a second example of the parallel
processor in accordance with the second embodiment of the present
invention;
FIG. 13 shows the structures of an instruction fetch unit and an
instruction issue unit of the parallel processor shown in FIG. 12;
FIG. 14 shows the structure of a third example of the parallel
processor in accordance with the second embodiment of the present
invention;
FIG. 15 shows the structures of an instruction fetch unit and an
instruction issue unit of the parallel processor shown in FIG. 14;
FIG. 16 shows the structure of a fourth example of the parallel
processor in accordance with the second embodiment of the present
invention;
FIG. 17 shows the structures of an instruction fetch unit and an
instruction issue unit of the parallel processor shown in FIG. 16;
FIG. 18 shows the structure of a fifth example of the parallel
processor in accordance with the second embodiment of the present
invention;
FIG. 19 shows the structures of an instruction fetch unit and an
instruction issue unit of the parallel processor shown in FIG. 18;
FIG. 20 shows the structure of a sixth example of the parallel
processor in accordance with the second embodiment of the present
invention;
FIG. 21 shows the structures of an instruction fetch unit and an
instruction issue unit of the parallel processor shown in FIG. 20;
FIG. 22 shows the structure of a first example of a parallel
processor in accordance with a third embodiment of the present
invention;
FIG. 23 shows the structure of a second example of the parallel
processor in accordance with the third embodiment of the present
invention;
FIG. 24 shows the structure of a third example of the parallel
processor in accordance with the third embodiment of the present
invention;
FIG. 25 shows the structure of a fourth example of the parallel
processor in accordance with the third embodiment of the present
invention;
FIG. 26 shows the structure of a fifth example of the parallel
processor in accordance with the third embodiment of the present
invention;
FIG. 27 shows the structure of a sixth example of the parallel
processor in accordance with the third embodiment of the present
invention;
FIG. 28 shows the structure of a first example of a parallel
processor in accordance with a fourth embodiment of the present
invention;
FIG. 29 shows the structure of a second example of the parallel
processor in accordance with the fourth embodiment of the present
invention;
FIG. 30 shows the structure of a third example of the parallel
processor in accordance with the fourth embodiment of the present
invention;
FIG. 31 shows the structure of a fourth example of the parallel
processor in accordance with the fourth embodiment of the present
invention;
FIG. 32 shows the structure of a fifth example of the parallel
processor in accordance with the fourth embodiment of the present
invention;
FIG. 33 shows the structure of a sixth example of the parallel
processor in accordance with the fourth embodiment of the present
invention;
FIG. 34 shows the structure of a first example of a parallel
processor in accordance with a fifth embodiment of the present
invention;
FIG. 35 shows the structure of a second example of the parallel
processor in accordance with the fifth embodiment of the present
invention;
FIG. 36 shows the structure of a third example of the parallel
processor in accordance with the fifth embodiment of the present
invention;
FIG. 37 shows the structure of a fourth example of the parallel
processor in accordance with the fifth embodiment of the present
invention;
FIG. 38 shows the structure of a fifth example of the parallel
processor in accordance with the fifth embodiment of the present
invention; and
FIG. 39 shows the structure of a sixth example of the parallel
processor in accordance with the fifth embodiment of the present
invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0015] The following is a description of embodi- 
ments of the present invention, with reference to the 
accompanying drawings. 

[Embodiment 1] 

[0016] FIGS. 3 and 6 show parallel processors 20 and 21 in
accordance with a first embodiment of the present invention. The
parallel processor 20 comprises an instruction fetch unit 46
connected to a memory 12, an instruction issue unit 72 connected
to the instruction fetch unit 46, two instruction execution units
EU0 and EU1 having the same structure and connected to the
instruction issue unit 72, and a register unit 98 connected to
each of the instruction execution units EU0 and EU1. Likewise,
the parallel processor 21 comprises an instruction fetch unit 47
connected to a memory 12, an instruction issue unit 73 connected
to the instruction fetch unit 47, two instruction execution units
EU0 and EU1 having the same structure and connected to the
instruction issue unit 73, and a register unit 98 connected to
each of the instruction execution units EU0 and EU1.

[0017] It should be noted that, in the following description,
the maximum basic instruction length of one instruction word is
2. However, the parallel processor in accordance with the first
embodiment should operate in the same manner in a case where the
maximum basic instruction length in one instruction word is 3 or
greater.

(Example 1) 

[0018] FIG. 4 shows the structure of the instruction fetch unit
46 and the instruction issue unit 72. The instruction fetch unit
46 comprises a fetch program counter (FPC) 300, adders 324 and
325, an instruction buffer 308, a cutting unit 316, and an
execution program counter (EPC) 339.
[0019] The FPC 300 is connected to the memory 12 and the
instruction execution units EU0 and EU1. The adder 324 is
connected to the FPC 300. The instruction buffer 308 is connected
to the memory 12, and the cutting unit 316 is connected to the
instruction buffer 308. The adder 325 is connected to the cutting
unit 316, and the EPC 339 is connected to the adder 325 and the
register unit 98. The FPC 300 receives a fetch address contained
in an instruction word from the memory 12, and the instruction
buffer 308 receives fetch data contained in the instruction word
from the memory 12. The FPC 300 further receives a branch
destination address corresponding to a branch instruction from
the instruction execution units EU0 and EU1.
[0020] On the other hand, the instruction issue unit 72
comprises an instruction register 347, selectors 355 and 356, a
control unit 370, and an AND gate 378. Here, the instruction
register 347 is connected to the cutting unit 316. The selectors
355 and 356 are both connected to the instruction register 347.
The selector 355 is connected to the instruction execution unit
EU0, while the selector 356 is connected to the instruction
execution unit EU1. The control unit 370 is connected to the AND
gate 378 and the selectors 355 and 356. The AND gate 378 is
connected to the instruction execution units EU0 and EU1. In this
structure, the instruction execution units EU0 and EU1 transmit
execution complete signals EUc0 and EUc1, respectively, to the
AND gate 378.
[0021] FIG. 5 shows the formats of instruction words to be
supplied to the parallel processors of the first embodiment. Each
instruction word is made up of one or more basic instructions EI
and at least one of instruction word delimiting fields 0 and 1.
The basic instruction word length is either 1 or 2. The upper row
of FIG. 5 indicates an instruction word having a basic
instruction word length of 2, consisting of a basic instruction
word made up of an instruction word delimiting field 0 and a
basic instruction EI, and another basic instruction word made up
of an instruction word delimiting field 1 and a basic instruction
EI. The lower row of FIG. 5 indicates an instruction word having
a basic instruction word length of 1, consisting of an
instruction word delimiting field 1 and a basic instruction EI.
[0022] The above instruction words are stored in the memory 12
in advance. The adder 324 in the instruction fetch unit 46 of the
parallel processor 20 increments the address by a fixed length
DISP, so that the instruction words can be fetched from the
memory 12 in order. When the cutting unit 316 in the instruction
fetch unit 46 fetches the instruction word of the upper row of
FIG. 5, for instance, it recognizes the instruction word
delimiting field and the following basic instruction EI as one
instruction word. The cutting unit 316 then cuts the instruction
word from the instruction word string, and stores it in the
instruction fetch unit 46. The adder 325 calculates the address
corresponding to the basic instruction EI to be executed in
accordance with an instruction word length signal SL supplied
from the cutting unit 316. The calculated address is temporarily
stored in the EPC 339. A return address for rerunning the basic
instruction EI that is stored in the EPC 339 is supplied to the
register unit 98.
[0023] Based on the instruction word delimiting fields 0 and 1
contained in the instruction words supplied from the cutting unit
316, the instruction issue unit 72 recognizes each basic
instruction EI, and issues each basic instruction EI selectively
to one of the instruction execution units EU0 and EU1 via the
selectors 355 and 356. Accordingly, a basic instruction EI
following an instruction word delimiting field 0 is issued to the
instruction execution unit EU0, while a basic instruction EI
following an instruction word delimiting field 1 is issued to the
instruction execution unit EU1. The selectors 355 and 356 are
controlled by the control unit 370. When the execution of one
instruction word is completed, the corresponding basic
instruction EI is supplied to the instruction execution units EU0
and EU1 via the selectors 355 and 356.
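
The cutting and issuing behaviour described in paragraphs [0021]
to [0023] can be sketched as follows. This is a behavioural
illustration only, not the circuit of FIG. 4: it assumes that a
delimiting field of 1 marks the last basic instruction of an
instruction word, that the maximum word length is 2, and the
function names and mnemonics are hypothetical.

```python
def cut_instruction_words(stream):
    """Split (delimiting_field, basic_instruction) pairs into instruction words.

    Assumption: a delimiting field of 1 closes the current instruction word.
    """
    words, current = [], []
    for field, instr in stream:
        current.append((field, instr))
        if field == 1:
            words.append(current)
            current = []
    if current:
        raise ValueError("stream ends inside an instruction word")
    return words


def issue(word):
    """Route each basic instruction: field 0 -> EU0, field 1 -> EU1."""
    if len(word) > 2:
        raise ValueError("this sketch assumes at most 2 basic instructions")
    return {("EU0" if field == 0 else "EU1"): instr for field, instr in word}


# Hypothetical stream: one word of length 2 followed by one word of length 1.
stream = [(0, "add"), (1, "mul"), (1, "sub")]
words = cut_instruction_words(stream)
issued = [issue(w) for w in words]
```

No NOP appears anywhere in the stream; the second word simply
leaves EU0 without an instruction.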
[0024] Likewise, in a case where the instruction fetch unit 46
fetches and then supplies the instruction word having the basic
instruction word length of 1 to the instruction buffer 308, the
cutting unit 316 cuts the basic instruction EI that follows the
instruction word delimiting field 1 from the rest of the
instruction word string. The instruction register 347 then issues
the basic instruction EI to one of the instruction execution
units EU0 and EU1.

[0025] The instruction word delimiting fields 0 and 1 are both
represented by one bit, but any sort of data can be written in
those fields as long as they can function to delimit the
instruction words. In this example, the two instruction execution
units EU0 and EU1 having the same structure are employed, but it
is also possible to employ three or more instruction execution
units.
[0026] As described so far, in the parallel processor 20 of this
example, the instruction fetch unit 46 fetches instruction words
one by one in accordance with the instruction word delimiting
fields 0 and 1, so that the length of each of the instruction
words can be made variable. The instruction issue unit 72 then
issues a basic instruction EI to a corresponding one of the
instruction execution units EU0 and EU1. Accordingly, there is no
need to include do-nothing instructions NOP in any instruction
word, and basic instructions EI can be efficiently included in
each instruction word. By executing the basic instructions EI in
the instruction words, the parallel processing performance of the
parallel processor can be improved.

(Example 2) 

[0027] FIG. 6 shows the structure of a second example of the
parallel processor 21 in accordance with the first embodiment of
the present invention. As shown in FIG. 6, the parallel processor
21 has the same structure as the parallel processor 20 shown in
FIG. 3, except for a judgment unit 103 that determines whether or
not each basic instruction EI supplied to the instruction issue
unit 73 has data dependence or control dependence with a basic
instruction EI being executed by one of the instruction execution
units EU0 and EU1, and whether or not each basic instruction EI
shares one resource with another basic instruction EI being
executed by one of the instruction execution units EU0 and EU1.

[0028] The judgment unit 103 compares a destination register
number (write register number) defined in a basic instruction EI
in execution with a source register number (read register number)
defined in a basic instruction EI to be issued to one of the
instruction execution units EU0 and EU1. If the destination
register number coincides with the source register number, it is
confirmed that there is data dependence between the two basic
instructions EI. If the destination register number does not
coincide with the source register number, it is confirmed that
there is no data dependence between the two basic instructions
EI, and the operation can proceed.

[0029] The judgment unit 103 also determines whether or not the
basic instruction EI in execution contains a branch instruction,
and whether or not the basic instruction EI has a possibility of
starting an irregular process such as a division by 0. If the
basic instruction EI in execution contains a branch instruction
or has a possibility of an irregular process, there is control
dependence between the basic instruction EI in execution and the
basic instruction EI to be issued to the instruction execution
unit EU0 or EU1. If the basic instruction EI in execution neither
contains a branch instruction nor has a possibility of an
irregular process, it is confirmed that there is no control
dependence between the two basic instructions EI.
[0030] Based on the contents of each basic instruction EI, the
judgment unit 103 also compares the resource (the instruction
execution units EU0 and EU1, for instance) required by the basic
instruction EI in execution with the resource required by the
basic instruction EI to be issued. If the resource required by
the basic instruction EI in execution is the same as the resource
required by the basic instruction EI to be issued, there is
resource sharing between the two basic instructions EI. If the
resources are different, it is confirmed that there is no
resource sharing between the two basic instructions EI.

[0031] If the basic instruction EI to be issued has neither data
dependence nor control dependence, and causes no resource sharing
with the basic instruction EI being executed by the instruction
execution units EU0 and EU1, the instruction issue unit 73 issues
the basic instruction EI to a corresponding one of the
instruction execution units EU0 and EU1 before the end of the
execution. Here, the instruction issuance by the instruction
issue unit 73 and the instruction execution by the instruction
execution units EU0 and EU1 are processed by time-sharing
parallel processing.
[0032] On the other hand, if the basic instruction EI to be
issued has data dependence and/or control dependence, and/or
causes resource sharing with the basic instruction EI being
executed by the instruction execution units EU0 and EU1, the
basic instruction EI is issued to a corresponding one of the
instruction execution units EU0 and EU1 after the end of the
execution.
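
The three checks made by the judgment unit 103 and the resulting
issue decision can be summarised in a short sketch. The record
layout, field names, and function names are hypothetical; the
text specifies only what is compared, not how an instruction is
encoded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BasicInstruction:
    dest: Optional[int]        # destination (write) register number, if any
    sources: Tuple[int, ...]   # source (read) register numbers
    unit: str                  # execution unit (resource) the instruction needs
    is_branch: bool = False
    may_trap: bool = False     # e.g. a division that could divide by 0


def data_dependent(running, pending):
    """Destination of the running instruction matches a source of the pending one."""
    return running.dest is not None and running.dest in pending.sources


def control_dependent(running):
    """Running instruction is a branch or may start an irregular process."""
    return running.is_branch or running.may_trap


def shares_resource(running, pending):
    """Both instructions require the same execution unit."""
    return running.unit == pending.unit


def can_issue_early(running, pending):
    """True if `pending` may be issued before `running` finishes."""
    return not (
        data_dependent(running, pending)
        or control_dependent(running)
        or shares_resource(running, pending)
    )


running = BasicInstruction(dest=3, sources=(1, 2), unit="EU0")
independent = BasicInstruction(dest=4, sources=(5, 6), unit="EU1")
reads_r3 = BasicInstruction(dest=4, sources=(3, 5), unit="EU1")
```

Here `independent` could be issued while `running` is still
executing, whereas `reads_r3` would wait because it reads the
register that `running` writes.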
[0033] Although the two instruction execution units EU0 and EU1
having the same structure are employed in this example, it is
also possible to employ three or more instruction execution
units.
[0034] As described so far, the parallel processor 21 of this
example can have the same effects as the parallel processor 20 of
Example 1, and efficiently and accurately performs the parallel
processing of the basic instructions EI. Thus, more reliable
operations can be achieved.

[Second Embodiment] 

[0035] FIGS. 7, 12, 14, 16, 18, and 20 show parallel processors
22 to 27 in accordance with a second embodiment of the present
invention. Each of the parallel processors 22-27 comprises an
instruction fetch unit 48-53 connected to a memory 12, an
instruction issue unit 74-79 connected to the instruction fetch
unit 48-53, instruction execution units LU0, IU0, IU1, FU0, FU1,
and BU0 connected to the instruction issue unit 74-79, and a
register unit 99 connected to all the instruction execution units
LU0, IU0, IU1, FU0, FU1, and BU0.
[0036] The instruction execution unit LU0 is a load store
instruction execution unit that executes a load instruction and a
store instruction. After the execution of these instructions, the
instruction execution unit LU0 notifies the instruction issue
unit 74-79 of the end of the execution. The instruction execution
units IU0 and IU1 are integer arithmetic instruction execution
units that execute integer arithmetic instructions. When the
execution of the integer arithmetic instructions is completed,
the instruction execution units IU0 and IU1 notify the
instruction issue unit 74-79 of the end of the execution.

[0037] The instruction execution units FU0 and FU1 are
floating-point arithmetic instruction execution units that
execute floating-point arithmetic instructions. When the
execution of the floating-point arithmetic instructions is
completed, the instruction execution units FU0 and FU1 notify the
instruction issue unit 74-79 of the end of the execution. The
instruction execution unit BU0 is a branch instruction execution
unit that executes a branch instruction. When the execution of
the branch instruction is completed, the instruction execution
unit BU0 notifies the instruction issue unit 74-79 of the end of
the execution.

[0038] In the following examples, the maximum 
basic instruction word length contained in one instruc- 
tion word is 2, but the same effects can be expected in 
a case where the maximum basic instruction word 
length is 3 or greater. 



(Example 1) 

[0039] FIG. 7 shows the structure of a first example of the
parallel processor in accordance with the second embodiment of
the present invention. As shown in FIG. 7, the parallel processor
22 comprises a conversion unit 115 in the instruction fetch unit
48. The conversion unit 115 rearranges basic instructions
contained in one fetched instruction word in accordance with the
structure of the instruction execution units LU0, IU0, IU1, FU0,
FU1, and BU0, and then supplies the rearranged basic instructions
to the instruction issue unit 74. This rearrangement by the
conversion unit 115 facilitates the issuance of the basic
instructions by the instruction issue unit 74.

[0040] More specifically, the parallel processor of the present
invention is embodied on a printed board or an LSI circuit. The
components are arranged on a two-dimensional surface and
connected by wires. At this point, the wires might cross each
other. However, a printed board and an LSI circuit have a
plurality of wiring layers, so that any two wires that might
cross each other can be arranged on two different wiring layers.
Logically, it is possible to place wires in any desired
arrangement. In view of the operation speed of the circuit,
however, the above alternate wiring (arranging wires on different
wiring layers) requires longer wires, which will decrease the
operation speed. Therefore, it is preferable to have less
alternate wiring. Shorter wires will facilitate the issuance of
the basic instructions by the instruction issue unit 74, and
increase the operation speed.
[0041] FIG. 8 shows the structures of the instruction fetch unit
48 and the instruction issue unit 74 of the parallel processor 22
shown in FIG. 7. The instruction fetch unit 48 and the
instruction issue unit 74 have the same structures as the
instruction fetch unit 46 and the instruction issue unit 72 shown
in FIG. 4, except that the instruction fetch unit 48 includes the
conversion unit 115 connected to a cutting unit 317. The
instruction execution unit BU0 supplies information, such as a
branch destination address corresponding to a branch instruction,
to an FPC 301.

[0042] For simplification of the drawing, only two instruction
passages from an instruction register 348 to the two instruction
execution units LU0 and IU0 are shown in FIG. 8. However, it
should be understood that there are the other instruction
passages to the instruction execution units IU1, FU0, FU1, and
BU0, as shown in FIG. 7.

[0043] The parallel processor 22 of this example operates in the
following manner. First, the cutting unit 317 of the instruction
fetch unit 48 fetches instruction words one by one. The formats
13 of the instruction words to be supplied to the instruction
fetch unit 48 are shown in FIG. 9. As shown in FIG. 9, each of
the instruction words includes an instruction word delimiting
field 0 and/or an instruction word delimiting field 1 and one or
two instructions selected from the group consisting of an integer
arithmetic instruction II, a floating-point arithmetic
instruction FI, a load store instruction LI, and a branch
instruction BI.

[0044] An interface 15 for the instruction execution units LU0,
IU0, IU1, FU0, FU1, and BU0 includes effective bits V,
information II required for executing an integer arithmetic
instruction, information FI required for executing a
floating-point arithmetic instruction, information LI required
for executing a load store instruction, and information BI
required for executing a branch instruction. The interface 15
supplies the effective bit V and the information LI from the
instruction issue unit 74 to the instruction execution unit LU0,
the effective bit V and the information II to the instruction
execution units IU0 and IU1, the effective bit V and the
information FI to the instruction execution units FU0 and FU1,
and the effective bit V and the information BI to the instruction
execution unit BU0.

[0045] When the effective bit V is 0, no basic instruction is
issued, and when the effective bit V is 1, a basic instruction is
issued. Each effective bit V is coupled with the information II,
FI, LI, or BI, and is then allocated to each corresponding
instruction execution unit.
[0046] As shown in FIG. 9, the instruction word formats 13 are
rearranged and converted into instruction word formats 17 by the
conversion unit 115 in the instruction fetch unit 48. The
instruction word formats 17 correspond to the instruction
execution units LU0, IU0, IU1, FU0, FU1, and BU0, and are
supplied to the instruction register 348 in the instruction issue
unit 74. The instruction register 348 issues basic instructions
each having the effective bit V of 1 to corresponding instruction
execution units. For instance, when the instruction word on the
uppermost row of the instruction word formats 17 is supplied to
the instruction issue unit 74, the instruction issue unit 74
issues the floating-point arithmetic instruction FI provided with
"1" as the effective bit V to the instruction execution unit FU0,
and the branch instruction BI also provided with "1" as the
effective bit V to the instruction execution unit BU0.
[0047] As a result, the instruction execution unit 
FUO executes the floating-point arithmetic instruction Fl, 
and the instruction execution unit BUO executes the 
branch instruction Bl. In this case, no basic instructions 
are executed by the other instruction execution units 
LUO, IU0, IU1, and FU1.
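The issue rule described above can be illustrated with a minimal Python sketch. The slot names, the function issue, and the operation strings are hypothetical and chosen only for illustration. The only behavior taken from the text is that a slot is issued to its instruction execution unit when its effective bit V is 1, and that the slots follow the order of the six instruction execution units.

```python
# Hypothetical slot names standing for the six instruction execution units,
# in the order the text lists them (load/store, integer x2, float x2, branch).
SLOTS = ["load_store_0", "integer_0", "integer_1",
         "float_0", "float_1", "branch_0"]


def issue(slot_word):
    """Return {unit: info} for every slot whose effective bit V is 1.

    slot_word is a list of (V, info) pairs, one pair per slot.
    """
    if len(slot_word) != len(SLOTS):
        raise ValueError("one (V, info) pair is required per slot")
    return {unit: info
            for unit, (v, info) in zip(SLOTS, slot_word)
            if v == 1}


# Uppermost-row example from the text: only the floating-point slot of the
# first floating-point unit and the branch slot carry V = 1.
word = [(0, None), (0, None), (0, None),
        (1, "float_op"), (0, None), (1, "branch_op")]
issued = issue(word)
print(issued)  # {'float_0': 'float_op', 'branch_0': 'branch_op'}
```

In this sketch the units whose slots carry V = 0 simply receive nothing, which corresponds to the statement that no basic instructions are executed by the other instruction execution units.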

[0048] FIG. 10 is a circuit diagram of the conversion
unit 115 shown in FIG. 8. As shown in FIG. 10, the con- 
version unit 115 comprises transmission lines L1 and 
L2, Bl detectors BD1 and BD2, Fl detectors FD1 and 
FD2, II detectors ID1 and ID2, LI detectors LD1 and 
LD2, buffers 155 to 158, AND gates 163 to 166, 185, 
and 186, exclusive OR gates 187 to 190, selectors 209
to 212, and OR gates 199 to 202.
[0049] The transmission line L1 transmits the first 
basic instruction contained in each instruction word, 
and the transmission line L2 transmits the second basic
instruction contained in each instruction word. The Bl
detector BD1 is connected to the transmission line L1,
and the Bl detector BD2 is connected to the transmission
line L2. The buffer 155 is connected to the Bl detector
BD1, and the AND gate 163 is connected to the Bl
detectors BD1 and BD2. The selector 209 is connected
to the transmission lines L1 and L2, the buffer 155, and
the AND gate 163. The OR gate 199 is connected to the
buffer 155 and the AND gate 163.
[0050] The Fl detector FD1 is connected to the
transmission line L1, and the Fl detector FD2 is connected
to the transmission line L2. The buffer 156 is
connected to the Fl detector FD1, and the AND gate
164 is connected to the Fl detectors FD1 and FD2. The
two input terminals of the exclusive OR gate 187 are
connected to the input node and the output node,
respectively, of the buffer 156. The two input terminals
of the exclusive OR gate 188 are connected to the output
node of the AND gate 164 and the Fl detector FD2,
respectively. The AND gate 185 is connected to the two
exclusive OR gates 187 and 188. The selector 210 is
connected to the transmission lines L1 and L2, the
buffer 156, and the AND gate 164. The OR gate 200 is
connected to the buffer 156 and the AND gate 164.
[0051] The II detector ID1 is connected to the transmission
line L1, and the II detector ID2 is connected to
the transmission line L2. The buffer 157 is connected to
the II detector ID1, and the AND gate 165 is connected
to the II detectors ID1 and ID2. The two input terminals
of the exclusive OR gate 189 are connected to the input
node and the output node, respectively, of the buffer
157. The two input terminals of the exclusive OR gate
190 are connected to the output node of the AND gate
165 and the II detector ID2, respectively. The AND gate
186 is connected to the two exclusive OR gates 189 and
190. The selector 211 is connected to the transmission
lines L1 and L2, the buffer 157, and the AND gate 165.
The OR gate 201 is connected to the buffer 157 and the
AND gate 165.

[0052] The LI detector LD1 is connected to the
transmission line L1, and the LI detector LD2 is connected
to the transmission line L2. The buffer 158 is
connected to the LI detector LD1, and the AND gate 166
is connected to the LI detectors LD1 and LD2. The
selector 212 is connected to the transmission lines L1
and L2, the buffer 158, and the AND gate 166. The OR
gate 202 is connected to the buffer 158 and the AND
gate 166.

[0053] The two Bl detectors BD1 and BD2 constitute
a Bl detector block 147. The two Fl detectors FD1
and FD2 constitute an Fl detector block 149. The two II
detectors ID1 and ID2 constitute an II detector block
151. The two LI detectors LD1 and LD2 constitute an LI
detector block 153.

[0054] In the following, an operation of the conversion
unit 115 having the above structure will be
described by way of an example case where the instruction
word including the basic instructions Bl and Fl on
the uppermost row of the instruction word formats 13
shown in FIG. 9 is supplied to the conversion unit 115.
First, the basic instruction Bl is transmitted through the
transmission line L1. The Bl detector BD1 then detects
the basic instruction Bl and supplies a detection signal
of logic 1 to the buffer 155. At this point, the AND gate
163 outputs a logic 0 signal. In accordance with the
detection signal supplied from the buffer 155, the selector
209 selects the first basic instruction Bl and outputs
the first basic instruction Bl, that is, an instruction to be
executed by the instruction execution unit BUO, to the
instruction issue unit 74. At the same time as the output
of the basic instruction Bl, in accordance with the detection
signal supplied from the buffer 155, the OR gate
199 outputs the effective bit V of logic 1. As the first
basic instruction Bl is detected, the Fl detector FD1, the
II detector ID1, and the LI detector LD1 output non-detection
signals of logic 0. Accordingly, the selectors
210, 211, and 212 do not select the first basic instruction
transmitted through the transmission line L1.
[0055] Next, the second basic instruction Fl in the 
instruction word is transmitted through the transmission 
line L2. As in the case of the first basic instruction Bl,
the Fl detector FD2 detects the second basic instruction
Fl and supplies a detection signal of logic 1 to the
AND gate 164. The AND gate 164 in turn outputs a logic
1 signal. In accordance with the logic 1 signal supplied 
from the AND gate 164, the selector 210 selects the 
second basic instruction Fl and outputs the second 
basic instruction Fl as an instruction to be executed by 
the instruction execution unit FUO. At the same time as 
the output of the basic instruction Fl, the OR gate 200 
outputs the effective bit V of logic 1 in accordance with 
the detection signal supplied from the AND gate 164. 
[0056] As the second basic instruction Fl is 
detected, the Bl detector BD2, the II detector ID2, and 
the LI detector LD2 output non-detection signals of logic 
0. Accordingly, the selectors 209, 211, and 212 do not 
select the second basic instruction transmitted through 
the transmission line L2. Since neither first nor second
basic instructions to be executed by the instruction execution
units LUO, IU0, IU1, and FU1 are detected, the
effective bit V of logic 0 is outputted from each of the
OR gates 201 and 202, and the AND gates 185 and 
186. 

[0057] In the above described manner, the conver- 
sion unit 115 converts the instruction word formats 13 
into the instruction word formats 17, as shown in FIG. 9. 
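The behavior of one detector column of the two-line conversion unit, as walked through above, may be sketched in Python as follows. This is a behavioral illustration only, not the circuit of FIG. 10: the function name, the kind labels, and the payload strings are hypothetical, and the masking of the second detector by the first is an assumption inferred from the statement that the AND gate outputs logic 0 when the first detector fires.

```python
def select_column(first, second, kind):
    """Model one detector column (detectors, buffer, AND gate, OR gate, selector).

    first and second are (kind, payload) tuples carried on the transmission
    lines L1 and L2, or None when a line carries nothing.
    Returns (effective_bit, selected_instruction).
    """
    d1 = int(first is not None and first[0] == kind)    # detector on line L1
    d2 = int(second is not None and second[0] == kind)  # detector on line L2
    buffer_out = d1                  # the buffer passes the L1 detection signal
    and_out = (1 - d1) & d2          # ASSUMPTION: an L1 hit masks the L2 hit
    effective_bit = buffer_out | and_out   # the OR gate produces V
    if buffer_out:
        selected = first             # selector takes the instruction on L1
    elif and_out:
        selected = second            # selector takes the instruction on L2
    else:
        selected = None              # nothing of this kind in the word
    return effective_bit, selected


# Instruction word of the walkthrough: a branch instruction on L1 and a
# floating-point instruction on L2 (labels are illustrative).
word = (("branch", "b_op"), ("float", "f_op"))
columns = {kind: select_column(word[0], word[1], kind)
           for kind in ("load_store", "integer", "float", "branch")}
print(columns["branch"])   # (1, ('branch', 'b_op'))
print(columns["float"])    # (1, ('float', 'f_op'))
print(columns["integer"])  # (0, None)
```

The sketch covers only the columns that feed a single instruction execution unit; the exclusive OR gates and the AND gates 185 and 186 that produce the remaining effective bits are not modeled.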
[0058] FIG. 11 is a circuit diagram of the conversion
unit 115 in a case where the maximum basic instruction
word length of one instruction word to be supplied from
the memory 12 to the instruction fetch unit 48 is 4. As
shown in FIG. 11, the structure of the conversion unit
115 in this case is the same as the structure of the conversion
unit 115 shown in FIG. 10, except that the
number of transmission lines is 4, the number of Bl
detectors is 4, the number of Fl detectors is 4, the
number of II detectors is 4, and the number of LI detectors
is 4. Also, two selectors 214 and 215 are provided
for a basic instruction Fl, and two selectors 216 and 217
are provided for a basic instruction II in this case.
[0059] The conversion unit 115 further includes
buffers 159 to 162, AND gates 167 to 184, exclusive OR
gates 191 to 198, OR gates 203 to 208, and selectors
213 and 218. The four Bl detectors BD1 to BD4 constitute
a Bl detector block 148. The four Fl detectors FD1
to FD4 constitute an Fl detector block 150. The four II
detectors ID1 to ID4 constitute an II detector block
152. The four LI detectors LD1 to LD4 constitute an LI
detector block 154.

[0060] The conversion unit 115 having the above
structure operates in the same manner as the conversion
unit 115 shown in FIG. 10. In the following, an operation
of the conversion unit 115 in a case where an
instruction word made up of basic instructions Bl, Fl, Fl,
and II is supplied to the conversion unit 115 will be
described. First, the first basic instruction Bl is transmitted
through the transmission line L1. The Bl detector
BD1 then detects the basic instruction Bl and supplies a
detection signal of logic 1 to the buffer 159. At this point,
each of the AND gates 167 to 169 outputs a logic 0 signal.
In accordance with the detection signal supplied
from the buffer 159, the selector 213 selects the first
basic instruction Bl and outputs the first basic instruction
Bl, that is, an instruction to be executed by the
instruction execution unit BUO, to the instruction issue
unit 74. At the same time as the output of the first basic
instruction Bl, the OR gate 203 outputs the effective bit
V of logic 1 in accordance with the detection signal supplied
from the buffer 159. As the first basic instruction Bl
is detected, the Fl detector FD1, the II detector ID1, and
the LI detector LD1 output non-detection signals of logic
0. Accordingly, the selectors 214, 216, and 218 do not
select the first basic instruction Bl transmitted through
the transmission line L1.

[0061] Next, the second basic instruction Fl is
transmitted on the transmission line L2. The Fl detector
FD2 then detects the second basic instruction Fl and
supplies a detection signal of logic 1 to the AND gate
170. The AND gate 170 in turn outputs a logic 1 signal.
In accordance with the logic 1 signal supplied from the
AND gate 170, the selector 214 selects the second
basic instruction Fl and outputs the second basic
instruction Fl as an instruction to be executed by the
instruction execution unit FUO. At the same time as the
output of the second basic instruction Fl, the OR gate
204 outputs the effective bit V of logic 1 in accordance
with the detection signal supplied from the AND gate
170.

[0062] As the second basic instruction Fl is 
detected, the Bl detector BD2, the II detector ID2, and 
the LI detector LD2 each output a non-detection signal 
of logic 0. Accordingly, the selectors 213, 216, and 218
do not select the second basic instruction Fl transmitted
through the transmission line L2.
[0063] Next, the third basic instruction Fl is transmitted
through the transmission line L3. The Fl detector
FD3 then detects the third basic instruction Fl and supplies
a detection signal of logic 1 to the AND gate
171. Since the AND gate 171 has already received a
detection signal of logic 1 from the Fl detector FD2 at
this point, the output of the AND gate 171 is a logic 0
signal. Because of that, the exclusive OR gate 193 outputs
a logic 1 signal, and the AND gate 174 also outputs
a logic 1 signal. In accordance with the logic 1 signal
supplied from the AND gate 174, the selector 215
selects the third basic instruction Fl and outputs the
third basic instruction Fl as an instruction to be executed
by the instruction execution unit FU1. At the same
time as the output of the third basic instruction Fl, the
OR gate 205 outputs the effective bit V of logic 1 in
accordance with the signal supplied from the AND gate
174.

[0064] As the third basic instruction Fl is detected, 
the Bl detector BD3, the II detector ID3, and the LI
detector LD3 each output a non-detection signal of logic
0. Accordingly, the selectors 213, 216, and 218 do not
select the third basic instruction Fl transmitted through
the transmission line L3.

[0065] Next, the fourth basic instruction II of the 
instruction word is transmitted through the transmission 
line L4. The II detector ID4 then detects the fourth basic
instruction II and supplies a detection signal of logic 1 to
the AND gate 178. The AND gate 178 in turn outputs a
logic 1 signal. In accordance with the logic 1 signal supplied
from the AND gate 178, the selector 216 selects
the fourth basic instruction II and outputs the fourth
basic instruction II as an instruction to be executed by
the instruction execution unit IU0. At the same time as
the output of the fourth basic instruction II, the OR gate
206 outputs the effective bit V of logic 1 in accordance
with the signal supplied from the AND gate 178.
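The net effect of the four-instruction walkthrough above can be summarized by a behavioral Python sketch: the first basic instruction of a given kind goes to the first instruction execution unit of that kind, and the second goes to the second unit. The names convert, UNITS_PER_KIND, and the kind labels are hypothetical, and raising an error when a word holds more instructions of one kind than there are units is an assumption, since the text does not describe that case.

```python
# Hypothetical labels for the four kinds of basic instruction and the number
# of instruction execution units of each kind named in the text.
UNITS_PER_KIND = {"load_store": 1, "integer": 2, "float": 2, "branch": 1}
SLOT_ORDER = ["load_store", "integer", "float", "branch"]


def convert(word):
    """Rearrange a list of (kind, payload) pairs into per-unit slots.

    Returns a dict mapping slot name to (V, payload), in slot order.
    """
    slots = {f"{kind}_{i}": (0, None)
             for kind in SLOT_ORDER
             for i in range(UNITS_PER_KIND[kind])}
    used = {kind: 0 for kind in UNITS_PER_KIND}
    for kind, payload in word:
        index = used[kind]
        if index >= UNITS_PER_KIND[kind]:
            # ASSUMPTION: such a word is rejected; the text is silent here.
            raise ValueError(f"too many {kind} instructions in one word")
        slots[f"{kind}_{index}"] = (1, payload)
        used[kind] += 1
    return slots


# Word of the walkthrough: branch, float, float, integer.
result = convert([("branch", "b"), ("float", "f1"),
                  ("float", "f2"), ("integer", "i")])
print([v for v, _ in result.values()])  # [0, 1, 0, 1, 1, 1]
```

The printed effective bits follow the slot order load/store, integer, integer, float, float, branch, so the second floating-point instruction lands in the second floating-point slot.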
[0066] As described above, in the parallel proces- 
sor of this example, basic instructions contained in each 
instruction word supplied to the instruction fetch unit 48 
are rearranged in accordance with the arrangement of 
the instruction execution units, so that the instruction
issue unit 74 can smoothly issue the basic instructions 
to the respective instruction execution units. Thus, the 
entire operation speed can be increased. 
[0067] In this example, the instruction fetch unit 48 
can also fetch an instruction word containing basic
instructions that have already been arranged in accordance
with the arrangement of the instruction execution
units in advance. In such a case, the basic instructions
are arranged in advance so that the circuit size required
for rearranging the basic instructions in the instruction
fetch unit 48 can be reduced. 

[0068] More specifically, when there are two 
instructions for the same function, only one of the two 
instructions is employed. For instance, the instruction 
word on the uppermost row and the instruction word on
the fourth row from the top of the formats 13 in FIG. 9 
are rearranged into the same formats in the formats 17. 
In this case, only one of the two instruction words
should be employed, while the use of the other should
be inhibited. Alternatively, an instruction word that will
increase the number of alternate wire routes in the
instruction fetch unit 48 may be inhibited beforehand.
For instance, the instruction words on the uppermost
row and the fourth row from the top of the formats 13 in 
FIG. 9 have the basic instructions Bl and Fl in the oppo- 
site orders. Since the circuit components are arranged 
on a two-dimensional surface, one of the two basic 
instructions requires more alternate wire routes than the 
other. Accordingly, the instruction word that requires 
more alternate wire routes should be inhibited in 
advance. 

[0069] As described so far, the circuit size of the 
parallel processor 22 can be reduced by restricting in 
advance the arrangement of basic instructions contained
in each instruction word to be supplied to the instruction 
fetch unit 48. 
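The restriction described above can be illustrated with a small Python check. The text does not say which of two equivalent orderings is kept, so the rule used here, keeping only words whose basic instructions already appear in slot order, is an assumption made for illustration, as are the names is_allowed and KIND_RANK and the kind labels.

```python
from itertools import product

# Hypothetical rank of each kind of basic instruction, following the order of
# the instruction execution units (load/store, integer, float, branch).
KIND_RANK = {"load_store": 0, "integer": 1, "float": 2, "branch": 3}


def is_allowed(kinds):
    """Return True when the kinds already appear in slot order.

    ASSUMPTION: of several orderings that convert to the same slot format,
    only the one sorted by slot order is permitted.
    """
    ranks = [KIND_RANK[kind] for kind in kinds]
    return ranks == sorted(ranks)


# A branch/float pair in its two orders: one is kept, the other inhibited.
print(is_allowed(["float", "branch"]))   # True
print(is_allowed(["branch", "float"]))   # False

# Of the 16 ordered pairs of kinds, 10 remain under this rule.
allowed_pairs = [pair for pair in product(KIND_RANK, repeat=2)
                 if is_allowed(pair)]
print(len(allowed_pairs))  # 10
```

Fewer permitted orderings mean fewer routes from the transmission lines to the slots, which is the sense in which the restriction can reduce circuit size.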

(Example 2) 

[0070] FIG. 12 shows the structure of a second 
example of the parallel processor in accordance with 
the second embodiment of the present invention. As 
shown in FIG. 12, the parallel processor 23 of this 
example has the same structure as the parallel processor
22 of Example 1, except that a conversion unit 116
is included in the instruction issue unit 75. The conver- 
sion unit 116 has the same structure and functions as 
the conversion unit 115 shown in FIGS. 10 and 11. 
[0071] FIG. 13 shows the structures of the instruc- 
tion fetch unit 49 and the instruction issue unit 75 of the 
parallel processor 23 shown in FIG. 12. The instruction 
fetch unit 49 and the instruction issue unit 75 have the
same structures as the instruction fetch unit 46 and the 
instruction issue unit 72 shown in FIG. 4, except that the 
instruction issue unit 75 includes the conversion unit 
116 connected to an instruction register 349. For simpli- 
fication of the drawing, only the instruction passages to 
the two instruction execution units LUO and IU0 are 
shown, and the instruction passages to the other 
instruction execution units IU1, FUO, FU1, and BUO are 
omitted in FIG. 13. Also, only two execution complete 
signals LUc and IUcO to be supplied to the AND gate 
380 are shown, and the other execution complete signals
are omitted in FIG. 13.

[0072] With the parallel processor of this example, 
basic instructions contained in each instruction word 
supplied from the instruction register 349 are rear- 
ranged by the conversion unit 116 in accordance with 
the arrangement of the instruction execution units. The 
rearranged basic instructions are then issued to the cor- 
responding instruction execution units. Thus, the wires 
can be shortened as a whole, and the operation speed 
can be increased. 

[0073] Also, the arrangement of basic instructions
contained in each instruction word to be supplied to the
instruction fetch unit 49 can be restricted in advance in
the same manner as in Example 1. Thus, the circuit size
of the parallel processor 23 can be reduced. 

(Example 3) 

[0074] FIG. 14 shows the structure of a third example
of the parallel processor in accordance with the second
embodiment of the present invention. As shown in
FIG. 14, the parallel processor 24 has the same structure
as the parallel processor 22 of Example 1 shown in
FIG. 7, except that the instruction fetch unit 50 includes
a first conversion unit 117 and the instruction issue unit
76 includes a second conversion unit 118. The first conversion
unit 117 rearranges basic instructions contained
in each instruction word in accordance with the arrangement
of the instruction execution units. The second conversion
unit 118 rearranges basic instructions contained
in each instruction word in accordance with the arrangement
of the instruction execution units.
[0075] FIG. 15 shows the structures of the instruction
fetch unit 50 and the instruction issue unit 76 of the
parallel processor 24 shown in FIG. 14. The instruction
fetch unit 50 and the instruction issue unit 76 have
the same structures as the instruction fetch unit 46 and
the instruction issue unit 72 shown in FIG. 4, except that
the instruction fetch unit 50 further includes the first conversion
unit 117 connected to a cutting unit 319 and the
instruction issue unit 76 further includes the second
conversion unit 118 connected to an instruction register
350. For simplification of the drawing, only the instruction
passages from the second conversion unit 118 to
the two instruction execution units LUO and IU0 are
shown, and the instruction passages to the other
instruction execution units IU1, FUO, FU1, and BUO are
omitted in FIG. 15. Likewise, only two execution complete
signals LUc and IUcO to be supplied to the AND
gate 381 are shown, and the other execution complete
signals are omitted in FIG. 15.
[0076] The first conversion unit 117 performs "preprocessing"
of the rearrangement of basic instructions.
The second conversion unit 118 performs "postprocessing"
of the rearrangement of basic instructions.
[0077] In an actual circuit, the processes performed 
by the instruction fetch unit 50 and the instruction issue 
unit 76 are pipelined so as to improve the performance
of the parallel processor. Because of that, the difference
in processing time between the instruction fetch unit 50 and
the instruction issue unit 76 should be as small as possible
to optimize the pipeline effects. Therefore, the
arrangement process is divided into the "preprocessing"
and "postprocessing", so that the difference in process- 
ing time between the instruction fetch unit 50 and the 
instruction issue unit 76 is small. 
[0078] More specifically, the first conversion unit
117 includes circuits that are the counterparts of the Bl
detector block 147 or 148, the Fl detector block 149 or
150, the II detector block 151 or 152, and the LI detector
block 153 or 154 shown in FIGS. 10 and 11. The other
circuits shown in FIGS. 10 and 11 are included in the
second conversion unit 118.
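The reason for dividing the rearrangement between the two units can be illustrated with a short Python calculation. All delay values below are invented for illustration and are in arbitrary units; they do not come from the text. The sketch only assumes that the cycle time of a two-stage pipeline is set by its slower stage.

```python
def cycle_time(fetch_delay, issue_delay):
    """Cycle time of a two-stage pipeline: the slower stage sets the pace."""
    return max(fetch_delay, issue_delay)


# Hypothetical delays (arbitrary units, not taken from the text).
FETCH = 3    # instruction fetch unit without any conversion circuit
ISSUE = 3    # instruction issue unit without any conversion circuit
DETECT = 1   # detector blocks ("preprocessing")
SELECT = 1   # remaining conversion circuits ("postprocessing")

all_in_fetch = cycle_time(FETCH + DETECT + SELECT, ISSUE)
all_in_issue = cycle_time(FETCH, ISSUE + DETECT + SELECT)
split = cycle_time(FETCH + DETECT, ISSUE + SELECT)

print(all_in_fetch, all_in_issue, split)  # 5 5 4
```

With these illustrative numbers, placing the whole conversion in either unit leaves one stage at 5 units, while splitting it balances both stages at 4 units.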

[0079] With the parallel processor 24 having the 
above structure, the wires can be shortened as a whole, 
and the operation speed can be increased.
[0080] Also, as in Examples 1 and 2, the circuit size 
of the parallel processor 24 may be reduced by restrict- 
ing in advance the arrangement of basic instructions 
contained in each instruction word to be supplied to the 
instruction fetch unit 50. 

(Example 4) 

[0081] FIG. 16 shows the structure of a fourth 
example of the parallel processor in accordance with 
the second embodiment of the present invention. As 
shown in FIG. 16, the parallel processor 25 has the 
same structure as the parallel processor 22 of Example 
1 shown in FIG. 7, except that the instruction fetch unit 
51 includes a conversion unit 119 and the instruction 
issue unit 77 includes a judgment unit 104.
[0082] FIG. 17 shows the structures of the instruction
fetch unit 51 and the instruction issue unit 77 of the
parallel processor 25 shown in FIG. 16. The instruction 
fetch unit 51 and the instruction issue unit 77 have the 
same structures as the instruction fetch unit 48 and the 
instruction issue unit 74 shown in FIG. 8, except that the 
instruction issue unit 77 further includes the judgment 
unit 104. The judgment unit 104 determines whether or 
not a basic instruction to be issued has data depend- 
ency or control dependency with a supplied basic 
instruction. The judgment unit 104 also determines 
whether or not the basic instruction to be issued shares 
resources with the supplied basic instruction. If the
basic instruction to be issued has data dependency or
control dependency, or shares resources with the supplied
basic instruction, the instruction issue unit 77
issues the basic instruction after the execution complete
signals LUc and IUcO are supplied.
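The decision made by the judgment unit can be sketched in Python as follows. The text does not state how the dependencies are detected, so the register-name comparison, the treatment of any supplied branch as a control dependency, and the equality test on unit names are all assumptions. The class Instr, the function must_wait, and the register and unit names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical record of a basic instruction."""
    unit: str                       # instruction execution unit it occupies
    dest: Optional[str] = None      # register written, if any
    srcs: Tuple[str, ...] = ()      # registers read
    is_branch: bool = False


def must_wait(candidate, supplied):
    """Return True when the candidate must wait for the supplied instruction.

    ASSUMPTION: data dependency is detected by comparing register names,
    control dependency by the supplied instruction being a branch, and
    resource sharing by both instructions needing the same unit.
    """
    data = supplied.dest is not None and (
        supplied.dest in candidate.srcs or supplied.dest == candidate.dest)
    control = supplied.is_branch
    resource = supplied.unit == candidate.unit
    return data or control or resource


load = Instr(unit="load_store_0", dest="r1", srcs=("r2",))
dependent_add = Instr(unit="integer_0", dest="r3", srcs=("r1", "r4"))
independent_add = Instr(unit="integer_0", dest="r5", srcs=("r6",))

print(must_wait(dependent_add, load))    # True
print(must_wait(independent_add, load))  # False
```

When must_wait returns True in this sketch, issue would be deferred until the corresponding execution complete signal arrives; otherwise the candidate can be issued in parallel.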
[0083] For simplification of the drawing, only the 
instruction passages from an instruction register 351 to 
the two instruction execution units LUO and IU0 are 
shown, and the other instruction passages to the 
instruction execution units IU1, FUO, FU1, and BUO are 
omitted in FIG. 17. Likewise, only the two execution 
complete signals LUc and IUcO are shown as signals to
be supplied to the judgment unit 104, but the other exe- 
cution complete signals are omitted in FIG. 17. 
[0084] The structure and operation of the conversion
unit 119 are substantially the same as the structure
and operation of the conversion unit 115 shown in FIGS.
10 and 11. The structure and operation of the judgment
unit 104 are substantially the same as the structure and
operation of the judgment unit 103 shown in FIG. 6.
[0085] By the parallel processor of this example 
having the above structure, the same effects as 
obtained by the parallel processor of Example 2 of the 
first embodiment and the parallel processor of Example
1 of the second embodiment can be obtained. In the
parallel processor of this example, the instruction issue
unit 77, which includes the judgment unit 104, enables
accurate and efficient parallel processing of basic
instructions, thereby increasing the reliability of the parallel
processor. Also, the instruction fetch unit 51, which
includes the conversion unit 119, facilitates the basic
instruction issuance to the instruction execution units by
the instruction issue unit 77, thereby increasing the
operation speed.
[0086] As in the foregoing examples, the circuit size
of the parallel processor 25 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to the
instruction fetch unit 51.

(Example 5) 

[0087] FIG. 18 shows the structure of a fifth example
of the parallel processor in accordance with the second
embodiment of the present invention. As shown in
FIG. 18, the parallel processor 26 has the same structure
as the parallel processor 25 of Example 4, except
that the instruction fetch unit 52 includes no conversion
unit and the instruction issue unit 78 further includes a
conversion unit 120.

[0088] FIG. 19 shows the structures of the instruction
fetch unit 52 and the instruction issue unit 78 of the
parallel processor 26 shown in FIG. 18. The instruction
fetch unit 52 and the instruction issue unit 78 have the
same structures as the instruction fetch unit 49 and the
instruction issue unit 75 shown in FIG. 13, except that
the instruction issue unit 78 further includes the judgment
unit 105 connected between an instruction register
352 and a control unit 375. In accordance with a
supplied basic instruction, the judgment unit 105 determines
whether or not a basic instruction to be issued
has the data dependency or control dependency, and
whether or not the basic instruction to be issued will
cause resource sharing. The judgment results are
reported to the control unit 375. If the basic instruction
to be issued has the data dependency or control
dependency, or causes resource sharing, the instruction
issue unit 78 issues the basic instruction after the
supply of the execution complete signals LUc and IUcO.
[0089] For simplification of the drawing, only the 
instruction passages from the instruction register 352 to 
the two instruction execution units LUO and IU0 are 
shown, and the instruction passages to the other 
instruction execution units are omitted in FIG. 19. Likewise,
only the two execution complete signals LUc and
IUcO to be supplied to the judgment unit 105 are shown 
in FIG. 19. 

[0090] The structure and operation of the conversion
unit 120 are the same as the structure and operation
of the conversion unit 115 shown in FIGS. 10 and
11. The structure and operation of the judgment unit
105 are the same as the structure and operation of the
judgment unit 104 shown in FIG. 16.
[0091] The parallel processor of this example hav- 
ing the above structure achieves the same effects as the 
parallel processor of Example 4. The instruction issue 
unit 78 including the judgment unit 105 enables accu- 
rate and efficient parallel processing of basic instruc- 
tions, thereby increasing the reliability of the operation. 
Also, the instruction issue unit 78, which further 
includes the conversion unit 120, facilitates the issu- 
ance of basic instructions to the instruction execution 
units. 

[0092] Additionally, the circuit size of the parallel 
processor 26 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 52, as in the foregoing examples. 

(Example 6) 

[0093] FIG. 20 shows the structure of a sixth example
of the parallel processor in accordance with the second
embodiment of the present invention. As shown in
FIG. 20, the parallel processor 27 has the same struc- 
ture as the parallel processor 24 of Example 3 shown in 
FIG. 14, except that the instruction issue unit 79 further 
includes a judgment unit 106.

[0094] FIG. 21 shows the structures of the instruc- 
tion fetch unit 53 and the instruction issue unit 79 of the 
parallel processor 27 shown in FIG. 20. The instruction 
fetch unit 53 and the instruction issue unit 79 have the 
same structures as the instruction fetch unit 50 and the 
instruction issue unit 76 shown in FIG. 15, except that 
the instruction issue unit 79 further includes the judg- 
ment unit 106 connected between an instruction regis- 
ter 353 and a control unit 376. Based on a supplied 
basic instruction, the judgment unit 106 determines 
whether or not a basic instruction to be issued has the 
data dependency or control dependency, or causes 
resource sharing. The judgment results are reported to
the control unit 376. If the basic instruction to be issued 
has the data dependency or control dependency, or 
causes resource sharing, the instruction issue unit 79 
issues the basic instruction only after the execution 
complete signals LUc and IUcO are supplied. 
[0095] For simplification of the drawing, only the 
instruction passages from the instruction register 353 to 
the two instruction execution units LUO and IU0 are 
shown, and the instruction passages to the other 
instruction execution units IU1, FUO, FU1, and BUO are
omitted in FIG. 21. Likewise, only the two execution
complete signals LUc and IUcO are shown in FIG. 21.
[0096] The structures and operations of a first con- 
version unit 121 and a second conversion unit 122 are 
the same as the structures and operations of the first 
conversion unit 117 and the second conversion unit 
118. The structure and operation of the judgment unit 
106 are the same as the structure and operation of the
judgment unit 103 shown in FIG. 6.
[0097] The parallel processor 27 of this example
having the above structure can achieve both effects of
the parallel processor of Example 2 of the first embodiment
and the parallel processor of Example 3 of the
second embodiment. More specifically, the instruction
issue unit 79 including the judgment unit 106 enables
accurate and efficient parallel processing of basic
instructions, thereby increasing the reliability of the
operation. Also, the instruction fetch unit 53 including
the first conversion unit 121 and the instruction issue
unit 79 including the second conversion unit 122 facilitate
the issuance of basic instructions from the instruction
issue unit 79 to the instruction execution units.
[0098] Additionally, the circuit size of the parallel 
processor 27 may be reduced by restricting in advance
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 53, as in the foregoing examples. 

[Third Embodiment]

[0099] As shown in FIGS. 22 to 27, parallel processors
28 to 33 in accordance with a third embodiment of
the present invention each comprise an instruction
fetch unit 54-59 connected to the memory 12, an
instruction issue unit 80-85 connected to the instruction
fetch unit 54-59, instruction execution units LUO, IU0,
IU1, FUO, FU1, MUO, MU1, and BUO, and a register unit
100 connected to all the instruction execution units.
Here, the instruction execution units MUO and MU1 are
special-purpose arithmetic instruction execution units
that execute special-purpose arithmetic instructions.
When the execution of special-purpose arithmetic
instructions is completed, the instruction execution units
MUO and MU1 notify the instruction issue unit 80-85 of
the completion of the execution.
[0100] In the following, the parallel processors in 
accordance with the third embodiment of the present 
invention will be described by way of a case where the 
maximum basic instruction word length contained in
one instruction word is 2. It should be understood that
the same effects can be obtained in a case where the
maximum instruction word length contained in one
instruction word is 3 or greater.

(Example 1) 

[0101] FIG. 22 shows the structure of a first example
of the parallel processor in accordance with the third
embodiment of the present invention. As shown in FIG.
22, the parallel processor 28 comprises a conversion
unit 123 in the instruction fetch unit 54. The structure
and the operation of the conversion unit 123 are the
same as those of the conversion unit 115 of Example 1 of the
second embodiment. More specifically, the conversion
unit 123 rearranges basic instructions contained in each
instruction word in accordance with the arrangement of
the instruction execution units, and then supplies the
rearranged basic instructions to the instruction issue
unit 80.

[0102] The parallel processor 28 having the above 
structure can achieve the same effects as the parallel 
processor 22 of Example 1 of the second embodiment. 
In other words, the issuance of basic instructions from 
the instruction issue unit 80 to the instruction execution 
units can be facilitated, and the operation speed can be 
increased. 
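
The rearrangement performed by the conversion unit 123 can be pictured as a slot assignment. The following is a minimal Python sketch rather than the circuit itself: the slot order `UNIT_ORDER`, the instruction kinds, the mapping `UNITS_FOR_KIND`, and the function `rearrange` are illustrative assumptions that do not appear in the specification.

```python
# Hypothetical slot layout: one slot per execution unit, in physical order.
UNIT_ORDER = ["LU0", "IU0", "IU1", "FU0", "FU1", "MU0", "MU1", "BU0"]

# Assumed mapping from instruction kind to the units able to execute it.
UNITS_FOR_KIND = {
    "load_store": ["LU0"],
    "integer": ["IU0", "IU1"],
    "float": ["FU0", "FU1"],
    "special": ["MU0", "MU1"],
    "branch": ["BU0"],
}


def rearrange(instruction_word):
    """Place each (kind, text) basic instruction in the slot of a free unit.

    Returns a list aligned with UNIT_ORDER; unused slots hold None.
    """
    slots = {unit: None for unit in UNIT_ORDER}
    for kind, text in instruction_word:
        for unit in UNITS_FOR_KIND[kind]:
            if slots[unit] is None:
                slots[unit] = text
                break
        else:
            # No candidate unit was free for this basic instruction.
            raise ValueError(f"no free execution unit for {kind!r}")
    return [slots[unit] for unit in UNIT_ORDER]


# An instruction word holding two basic instructions in arbitrary order.
word = [("float", "fadd f1, f2, f3"), ("load_store", "ld r1, 0(r2)")]
rearranged = rearrange(word)
```

Once the word is in slot order, each position maps directly to one execution unit, so the issue step needs no further selection logic in this simplified model.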

[0103] Additionally, the circuit size of the parallel 
processor 28 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction execu- 
tion units, as in the foregoing examples. 

(Example 2) 

[0104] FIG. 23 shows the structure of a second 
example of the parallel processor in accordance with 
the third embodiment of the present invention. As 
shown in FIG. 23, the parallel processor 29 has the 
same structure as the parallel processor 23 shown in 
FIG. 12, comprising a conversion unit 124 in the instruc- 
tion issue unit 81. The structure and operation of the 
conversion unit 124 are the same as the structure and 
operation of the conversion unit 115 shown in FIGS. 10 
and 11. 

[0105] In the parallel processor 29 of this example, 
the instruction issue unit 81 issues each basic instruc- 
tion to the corresponding one of the instruction execu- 
tion units, only after the conversion unit 124 rearranges 
the basic instructions, which are contained in each 
instruction word supplied from the instruction fetch unit 
55, in accordance with the arrangement of the instruc- 
tion execution units. Thus, wires can be shortened as a 
whole, and the operation speed can be increased. 
[0106] Additionally, the circuit size of the parallel 
processor 29 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 55, as in the foregoing examples. 
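
One way to read the restriction mentioned above is as an ordering rule that the program generator obeys, so that the hardware never has to handle an arbitrary permutation. The sketch below assumes such a rule purely for illustration; the kind names, `KIND_ORDER`, and `obeys_restriction` are hypothetical and are not taken from the specification.

```python
# Assumed canonical order of instruction kinds within an instruction word.
KIND_ORDER = ["load_store", "integer", "float", "special", "branch"]
RANK = {kind: position for position, kind in enumerate(KIND_ORDER)}


def obeys_restriction(kinds):
    """Return True if the kinds already appear in the canonical order."""
    ranks = [RANK[kind] for kind in kinds]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


in_order = obeys_restriction(["load_store", "float"])
out_of_order = obeys_restriction(["float", "load_store"])
```

Under this assumed rule, a word that passes the check needs only to be spread across the slots, not reordered, which is the kind of simplification that could allow a smaller conversion circuit.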

(Example 3) 

[0107] FIG. 24 shows the structure of a third exam- 
ple of the parallel processor in accordance with the third 
embodiment of the present invention. As shown in FIG. 
24, the parallel processor 30 has substantially the same 
structure as the parallel processor 24 shown in FIG. 14. 
The instruction fetch unit 56 includes a first conversion 
unit 125 that rearranges basic instructions contained in 
each fetched instruction word in accordance with the 
arrangement of the instruction execution units. The 
instruction issue unit 82 includes a second conversion 
unit 126 that further rearranges basic instructions con- 
tained in each instruction word supplied from the 
instruction fetch unit 56 in accordance with the arrange- 
ment of the instruction execution units. 



EP 1 089 168 A2 



[0108] The first conversion unit 125 performs "pre-
processing" of the rearrangement of basic instructions, and
the second conversion unit 126 performs "postprocess-
ing" of the rearrangement of the basic instructions.

[0109] In an actual circuit, the processes in the
instruction fetch unit 56 and the instruction issue unit 82
are pipelined in order to improve the performance of the
parallel processor. Because of that, the difference in
processing time between the instruction fetch unit 56 and
the instruction issue unit 82 should be as small as pos-
sible to optimize the pipeline effects. Therefore, the
rearrangement process is divided into the "preprocessing"
and "postprocessing", so that the difference in process-
ing time between the instruction fetch unit 56 and the
instruction issue unit 82 is small.
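
The benefit of dividing the rearrangement between the two pipeline stages can be illustrated with a small calculation. This is a sketch under stated assumptions: the stage delays are invented numbers, and the model assumes that the rearrangement delay can be split evenly and simply adds to the delay of the stage that hosts it.

```python
def cycle_time(stage_delays):
    """A synchronous pipeline is clocked by its slowest stage."""
    return max(stage_delays)


# Hypothetical delays in arbitrary time units.
FETCH, ISSUE, REARRANGE = 4.0, 4.0, 3.0

# All rearrangement done in the fetch stage.
all_in_fetch = cycle_time([FETCH + REARRANGE, ISSUE])

# All rearrangement done in the issue stage.
all_in_issue = cycle_time([FETCH, ISSUE + REARRANGE])

# Rearrangement split into preprocessing and postprocessing halves.
split = cycle_time([FETCH + REARRANGE / 2, ISSUE + REARRANGE / 2])
```

With these assumed numbers, either unsplit arrangement gives a cycle time of 7.0, while the split arrangement gives 5.5, because neither stage carries the whole rearrangement delay.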
[0110] By the parallel processor of this example 
having the above structure, wires can be shortened as a 
whole, and the operation speed can be increased. 
[0111] Additionally, the circuit size of the parallel 
processor 30 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 56, as in the foregoing examples. 

(Example 4) 

[0112] FIG. 25 shows the structure of a fourth
example of the parallel processor in accordance
with the third embodiment of the present invention. As
shown in FIG. 25, the parallel processor 31 has the
same structure as the parallel processor 25 shown in
FIG. 16. The instruction fetch unit 57 includes a conver-
sion unit 127, and the instruction issue unit 83 includes
a judgment unit 107.
[0113] The structure and operation of the conver-
sion unit 127 are the same as the structure and opera-
tion of the conversion unit 115 shown in FIGS. 10 and
11. The structure and operation of the judgment unit
107 are the same as the structure and operation of the
judgment unit 103 shown in FIG. 6.
[0114] By the parallel processor of this example 
having the above structure, the same effects as the par- 
allel processor of Example 4 of the second embodiment 
can be obtained. More specifically, the instruction issue 
unit 83 including the judgment unit 107 enables accu-
rate and efficient parallel processing of basic instruc-
tions, thereby increasing the reliability of the operation.
The instruction fetch unit 57 including the conversion
unit 127 facilitates the issuance of basic instructions to
the instruction execution units, thereby increasing the
operation speed.

[0115] Additionally, the circuit size of the parallel 
processor 31 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch
unit 57, as in the foregoing examples.



(Example 5) 

[0116] FIG. 26 shows the structure of a fifth exam- 
ple of the parallel processor in accordance with the third 
embodiment of the present invention. As shown in FIG.
26, the parallel processor 32 has the same structure as
the parallel processor 26 of Example 5 of the second 
embodiment shown in FIG. 18. The instruction issue 
unit 84 includes a conversion unit 128 and a judgment 
unit 108. 

[0117] The structure and operation of the conver- 
sion unit 128 are the same as the structure and opera- 
tion of the conversion unit 115 shown in FIGS. 10 and 
11. The structure and operation of the judgment unit
108 are the same as the structure and operation of the
judgment unit 103.

[0118] By the parallel processor of this example
having the above structure, the same effects as the par-
allel processor 26 of Example 5 of the second embodi-
ment can be obtained. More specifically, the instruction issue unit 84
including the judgment unit 108 enables accurate and 
efficient parallel processing of basic instructions, 
thereby increasing the reliability of the operation. The 
instruction issue unit 84 further including the conversion 
unit 128 facilitates the issuance of basic instructions to 
the instruction execution units, thereby increasing the 
operation speed. 

[0119] Additionally, the circuit size of the parallel 
processor 32 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 58, as in the foregoing examples. 

(Example 6) 

[0120] FIG. 27 shows the structure of a sixth exam-
ple of the parallel processor in accordance with the third
embodiment of the present invention. As shown in FIG.
27, the parallel processor 33 has the same structure as
the parallel processor 27 shown in FIG. 20.
[0121] The structures and operations of a first con- 
version unit 129 and a second conversion unit 130 are 
the same as the structures and operations of the first 
conversion unit 117 and the second conversion unit 118
shown in FIG. 14. The structure and operation of a judg- 
ment unit 109 are the same as the structure and opera- 
tion of the judgment unit 103. 

[0122] By the parallel processor of this example 
having the above structure, the same effects as 
obtained by the parallel processor 27 of Example 6 of 
the second embodiment can be obtained. More specifi- 
cally, the instruction issue unit 85 including the judg- 
ment unit 109 enables accurate and efficient parallel 
processing of basic instructions, thereby increasing the 
reliability of the operation. The instruction fetch unit 59 
including the first conversion unit 129 and the instruc- 
tion issue unit 85 including the second conversion unit 
130 facilitate the issuance of basic instructions from the
instruction issue unit 85 to the instruction execution
units, thereby increasing the operation speed. 
[0123] Additionally, the circuit size of the parallel 
processor 33 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 59, as in the foregoing examples. 

[Fourth Embodiment] 

[0124] As shown in FIGS. 28 to 33, parallel proces-
sors 34-39 in accordance with a fourth embodiment of
the present invention each comprise an instruction
fetch unit 60-65 connected to the memory 12, an
instruction issue unit 86-91 connected to the instruction
fetch unit 60-65, instruction execution units LU0, LU1,
IU0, IU1, FU0, FU1, BU0, and BU1 connected to the
instruction issue unit 86-91, and a register unit 101 con-
nected to all the instruction execution units. In this
embodiment, the instruction execution unit LU1 is a load 
store instruction execution unit that executes load 
instructions and store instructions. The instruction exe- 
cution unit BU1 is a branch instruction execution unit 
that executes branch instructions. When the execution 
is completed, the instruction execution unit BU1 notifies 
the instruction issue unit 86-91 of the end of the execu- 
tion. 

[0125] In the following, the parallel processor in 
accordance with the fourth embodiment of the present 
invention will be described by way of examples in which 
the maximum basic instruction word length contained in
each instruction word is 4. In FIGS. 28 to 33, the
maximum basic instruction word length being 4 is indi-
cated by four arrows from the instruction fetch unit 60-65
to the instruction issue unit 86-91. However, it should
be understood that the maximum basic instruction word 
length in the fourth embodiment is not limited to 4. 
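
The instruction words handled by the instruction fetch unit are delimited by instruction delimiting information and, in these examples, hold at most four basic instructions. The following Python sketch shows one way such fetching could behave. The encoding is an assumption made for illustration only: each basic instruction is paired with a hypothetical end-of-word flag, and the names `fetch_words` and `MAX_BASIC_INSTRUCTIONS` do not come from the specification.

```python
MAX_BASIC_INSTRUCTIONS = 4  # the example value used for this embodiment


def fetch_words(stream, max_len=MAX_BASIC_INSTRUCTIONS):
    """Yield instruction words from (basic_instruction, is_last) pairs.

    The is_last flag stands in for the instruction delimiting information.
    """
    word = []
    for instruction, is_last in stream:
        word.append(instruction)
        if len(word) > max_len:
            raise ValueError("instruction word exceeds the maximum length")
        if is_last:
            yield word
            word = []
    if word:
        raise ValueError("stream ended inside an instruction word")


# Two instruction words: one with three basic instructions, one with one.
stream = [("ld", False), ("add", False), ("br", True), ("fmul", True)]
words = list(fetch_words(stream))
```

Passing a different `max_len` models the statement that the maximum basic instruction word length is not limited to 4.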

(Example 1) 

[0126] FIG. 28 shows the structure of a first exam- 
ple of the parallel processor in accordance with the 
fourth embodiment of the present invention. As shown 
in FIG. 28, the parallel processor 34 comprises a con- 
version unit 131 in the instruction fetch unit 60. The 
structure and operation of the conversion unit 131 are 
the same as the structure and operation of the conver- 
sion unit 115 of Example 1 of the second embodiment. 
More specifically, the conversion unit 131 rearranges 
basic instructions contained in each fetched instruction 
word, in accordance with the arrangement of the 
instruction execution units, and supplies the rearranged 
basic instructions to the instruction issue unit 86. 
[0127] By the parallel processor 34 having the 
above structure, the same effects as obtained by the 
parallel processor 22 of Example 1 of the second 
embodiment can also be obtained. More specifically, 
the issuance of basic instructions from the instruction
issue unit 86 to the instruction execution units can be
facilitated, and the operation speed can be increased 
accordingly. 

[0128] Additionally, the circuit size of the parallel 
processor 34 may be reduced by restricting in advance
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 60, as in the foregoing embodiments. 

(Example 2)

[0129] FIG. 29 shows the structure of a second 
example of the parallel processor in accordance with 
the fourth embodiment of the present invention. As
shown in FIG. 29, the parallel processor 35 has the
same structure as the parallel processor 23 shown in
FIG. 12, in that the instruction issue unit 87 includes a
conversion unit 132. The structure and operation of the
conversion unit 132 are the same as the structure and
operation of the conversion unit 115 shown in FIGS. 10
and 11.

[0130] In the parallel processor 35 of this example, 
the instruction issue unit 87 rearranges basic instruc-
tions contained in each instruction word supplied from the
instruction fetch unit 61, in accordance with the arrange-
ment of the instruction execution units, and then supplies
the rearranged basic instructions to the instruction exe- 
cution units. Thus, wires can be shortened as a whole, 
and the operation speed can be increased. 

[0131] Additionally, the circuit size of the parallel
processor 35 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 61, as in the foregoing examples.

(Example 3) 

[0132] FIG. 30 shows the structure of a third exam-
ple of the parallel processor in accordance with the
fourth embodiment of the present invention. As shown
in FIG. 30, the parallel processor 36 has the same struc-
ture as the parallel processor 24 shown in FIG. 14. The
instruction fetch unit 62 of this parallel processor 36
includes a first conversion unit 133 that rearranges
basic instructions contained in each fetched instruction
word, in accordance with the arrangement of the
instruction execution units. The instruction issue unit 88
of the parallel processor 36 includes a second conver-
sion unit 134 that further rearranges the basic instruc-
tions contained in each instruction word supplied from
the instruction fetch unit 62, in accordance with the
arrangement of the instruction execution units.
[0133] The first conversion unit 133 performs "pre-
processing" of the rearrangement of basic instructions,
and the second conversion unit 134 performs "post-
processing" of the rearrangement of the basic instruc-
tions.

[0134] To improve the performance of the parallel
processor in an actual circuit, the processes in the
instruction fetch unit 62 and the instruction issue unit 88
are pipelined. Because of that, the difference in
processing time between the instruction fetch unit 62 and
the instruction issue unit 88 should be as small as pos-
sible to optimize the pipeline effects. Therefore, the
rearrangement process is divided into the "preprocessing"
and "postprocessing", so that the difference in process-
ing time between the instruction fetch unit 62 and the
instruction issue unit 88 is small.
[0135] By the parallel processor 36 of this example 
having the above structure, wires can be shortened as a 
whole, and the operation speed can be increased. 
[0136] Additionally, the circuit size of the parallel 
processor 36 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 62, as in the foregoing examples. 

(Example 4) 

[0137] FIG. 31 shows the structure of a fourth 
example of the parallel processor in accordance with 
the fourth embodiment of the present invention. As 
shown in FIG. 31, the parallel processor 37 has the 
same structure as the parallel processor 25 shown in 
FIG. 16, in that the instruction fetch unit 63 includes a 
conversion unit 135 and the instruction issue unit 89 
includes a judgment unit 110. 
[0138] The structure and operation of the conver-
sion unit 135 are the same as the structure and opera-
tion of the conversion unit 115 shown in FIGS. 10 and
11. On the other hand, the structure and operation of
the judgment unit 110 are the same as the judgment
unit 103 shown in FIG. 6.
[0139] By the parallel processor 37 of this example
having the above structure, the same effects as 
obtained by the parallel processor 25 of Example 4 of 
the second embodiment can be obtained. More specifi-
cally, the instruction issue unit 89 including the judg-
ment unit 110 enables accurate and efficient parallel
processing of basic instructions, thereby increasing the
reliability of the operation. The instruction fetch unit 63
including the conversion unit 135 facilitates the issu-
ance of basic instructions from the instruction issue unit
89 to the instruction execution units, thereby increasing
the operation speed. 

[0140] Additionally, the circuit size of the parallel 
processor 37 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch 
unit 63, as in the foregoing examples. 

(Example 5) 

[0141] FIG. 32 shows the structure of a fifth exam- 
ple of the parallel processor in accordance with the 
fourth embodiment of the present invention. As shown
in FIG. 32, the parallel processor 38 has the same struc-
ture as the parallel processor 26 of Example 5 of the
second embodiment shown in FIG. 18, in that the
instruction issue unit 90 includes a conversion unit 136
and a judgment unit 111.

[0142] The structure and operation of the conver-
sion unit 136 are the same as the structure and opera-
tion of the conversion unit 115 shown in FIGS. 10 and
11. On the other hand, the structure and operation of
the judgment unit 111 are the same as the structure and
operation of the judgment unit 103 shown in FIG. 6.
[0143] By the parallel processor of this example 
having the above structure, the same effects as 
obtained by the parallel processor 26 of Example 5 of 
the second embodiment can be obtained. More specifi- 
cally, the instruction issue unit 90 including the judg- 
ment unit 111 enables accurate and efficient parallel 
processing of basic instructions, thereby increasing the 
reliability of the operation. The instruction issue unit 90 
further including the conversion unit 136 facilitates the 
issuance of basic instructions to the instruction execution
units, thereby increasing the operation speed. 
[0144] Additionally, the circuit size of the parallel 
processor 38 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 64, as in the foregoing examples. 

(Example 6) 

[0145] FIG. 33 shows the structure of a sixth exam- 
ple of the parallel processor in accordance with the 
fourth embodiment of the present invention. As shown 
in FIG. 33, the parallel processor 39 has the same struc- 
ture as the parallel processor 27 shown in FIG. 20. 
[0146] The structures and operations of a first con- 
version unit 137 and a second conversion unit 138 are 
the same as the structures and operations of the first 
conversion unit 117 and the second conversion unit 118
shown in FIG. 14. On the other hand, the structure and
operation of the judgment unit 112 are the same as the
structure and operation of the judgment unit 103 shown 
in FIG. 6. 

[0147] By the parallel processor 39 of this example 
having the above structure, the same effects as 
obtained by the parallel processor of Example 6 of the 
second embodiment can be obtained. More specifically, 
the instruction issue unit 91 including the judgment unit 
112 enables accurate and efficient parallel processing
of basic instructions, thereby increasing the reliability of
the operation. The instruction fetch unit 65 including the
first conversion unit 137 and the instruction issue unit 91
further including the second conversion unit 138 facili- 
tate the issuance of basic instructions from the instruc- 
tion issue unit 91 to the instruction execution units, 
thereby increasing the operation speed. 
[0148] Additionally, the circuit size of the parallel 
processor 39 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch 
unit 65, as in the foregoing examples. 

[Fifth Embodiment]

[0149] As shown in FIGS. 34 to 39, parallel proces- 
sors 40 to 45 in accordance with a fifth embodiment of 
the present invention each comprise an instruction fetch 
unit 66-71 connected to the memory 12, an instruction
issue unit 92-97 connected to the instruction fetch unit
66-71, instruction execution units LU0, LU1, IU0, IU1,
FU0, FU1, MU0, MU1, BU0, and BU1, and a register
unit 102 connected to all the instruction execution units. 
[0150] In the following, the parallel processor in
accordance with the fifth embodiment of the present
invention will be described by way of examples in which
the maximum basic instruction word length contained in
each instruction word is 4. In FIGS. 34 to 39, the maxi-
mum basic instruction word length being 4 is indicated
by four arrows extending from the instruction fetch unit
66-71 to the instruction issue unit 92-97.
[0151] It should be understood that the maximum
basic instruction word length is not limited to 4 in this
embodiment.



(Example 1)

[0152] FIG. 34 shows the structure of a first exam-
ple of the parallel processor in accordance with the fifth
embodiment of the present invention. As shown in FIG.
34, the parallel processor 40 comprises a conversion
unit 139 in the instruction fetch unit 66. The structure
and operation of the conversion unit 139 are the same
as the structure and operation of the conversion unit
115 of Example 1 of the second embodiment of the
present invention. The conversion unit 139 rearranges
basic instructions contained in each fetched instruction
word, in accordance with the arrangement of the
instruction execution units, and then supplies the rear-
ranged basic instructions to the instruction issue unit
92.

[0153] By the parallel processor 40 having the
above structure, the same effects as obtained by the
parallel processor 22 of Example 1 of the second
embodiment can be obtained. More specifically, the
issuance of basic instructions from the instruction issue
unit 92 to the instruction execution units can be facili-
tated, and the operation speed can be increased
accordingly.
[0154] Additionally, the circuit size of the parallel
processor 40 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 66, as in the foregoing embodiments.

(Example 2)

[0155] FIG. 35 shows the structure of a second
example of the parallel processor in accordance with
the fifth embodiment of the present invention. As shown
in FIG. 35, the parallel processor 41 has the same struc-
ture as the parallel processor 23 shown in FIG. 12, in
that the instruction issue unit 93 includes a conversion
unit 140. The structure and operation of the conversion
unit 140 are the same as the structure and operation of
the conversion unit 115 shown in FIGS. 10 and 11.
[0156] In the parallel processor 41 of this example,
the instruction issue unit 93 rearranges basic instruc-
tions contained in each instruction word supplied from
the instruction fetch unit 67, and then supplies the rear-
ranged basic instructions to the instruction execution
units. Thus, wires can be shortened as a whole, and the
operation speed can be increased.
[0157] Additionally, the circuit size of the parallel
processor 41 may be reduced by restricting in advance
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 67, as in the foregoing examples.

(Example 3)

[0158] FIG. 36 shows the structure of a third exam- 
ple of the parallel processor in accordance with the fifth 
embodiment of the present invention. As shown in FIG. 
36, the parallel processor 42 of this example has the 
same structure as the parallel processor 24 shown in 
FIG. 14. The instruction fetch unit 68 of the parallel 
processor 42 includes a first conversion unit 141 that 
rearranges basic instructions contained in each fetched 
instruction word in accordance with the arrangement of 
the instruction execution units. The instruction issue unit 
94 of the parallel processor 42 includes a second con- 
version unit 142 that further rearranges basic instruc- 
tions contained in each instruction word supplied from 
the instruction fetch unit 68 in accordance with the 
arrangement of the instruction execution units. 
[0159] The first conversion unit 141 performs "pre- 
processing" of the rearrangement of basic instructions, 
and the second conversion unit 142 performs "post- 
processing" of the rearrangement of the basic instruc- 
tions. 

[0160] In order to improve the performance of the 
parallel processor in an actual circuit, the processes in 
the instruction fetch unit 68 and the instruction issue unit 
94 are pipelined. Because of that, the difference in 
processing time between the instruction fetch unit 68 and
the instruction issue unit 94 should be as small as pos-
sible to optimize the pipeline effects. Therefore, the
rearrangement process is divided into the "preprocessing"
and "postprocessing", so that the difference in process-
ing time between the instruction fetch unit 68 and the
instruction issue unit 94 can be small.
[0161] By the parallel processor 42 of this example
having the above structure, wires can be shortened as a
whole, and the operation speed can be increased. 
[0162] Additionally, the circuit size of the parallel 
processor 42 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch 
unit 68, as in the foregoing examples. 

(Example 4) 

[0163] FIG. 37 shows the structure of a fourth 
example of the parallel processor in accordance with 
the fifth embodiment of the present invention. As shown 
in FIG. 37, the parallel processor 43 has the same struc-
ture as the parallel processor 25 shown in FIG. 16, in
that the instruction fetch unit 69 includes a conversion 
unit 143 and the instruction issue unit 95 includes a 
judgment unit 113. 

[0164] The structure and operation of the conver-
sion unit 143 are the same as the structure and opera-
tion of the conversion unit 115 shown in FIGS. 10 and
11. On the other hand, the structure and operation of
the judgment unit 113 are the same as the structure and
operation of the judgment unit 103 shown in FIG. 6. 
[0165] By the parallel processor 43 of this example
having the above structure, the same effects as
obtained by the parallel processor 25 of Example 4 of
the second embodiment can be obtained. More specifi-
cally, the instruction issue unit 95 including the judg-
ment unit 113 enables accurate and efficient parallel
processing of basic instructions, thereby increasing the
reliability of the operation. The instruction fetch unit 69
including the conversion unit 143 facilitates the issu-
ance of basic instructions from the instruction issue unit
95 to the instruction execution units, thereby increasing
the operation speed. 

[0166] Additionally, the circuit size of the parallel 
processor 43 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch
unit 69, as in the foregoing examples. 

(Example 5) 

[0167] FIG. 38 shows the structure of a fifth exam-
ple of the parallel processor in accordance with the fifth
embodiment of the present invention. As shown in FIG.
38, the parallel processor 44 of this example has the
same structure as the parallel processor 26 of Example
5 of the second embodiment shown in FIG. 18, in that
the instruction issue unit 96 includes a conversion unit 
144 and a judgment unit 114. 

[0168] The structure and operation of the conver-
sion unit 144 are the same as the structure and opera-
tion of the conversion unit 115 shown in FIGS. 10 and
11. On the other hand, the structure and operation of
the judgment unit 114 are the same as the structure and
operation of the judgment unit 103 shown in FIG. 6.
[0169] By the parallel processor 44 of this example
having the above structure, the same effects as 
obtained by the parallel processor 26 of Example 5 of 
the second embodiment can be obtained. More specifi- 
cally, the instruction issue unit 96 including the judg- 
ment unit 114 enables accurate and efficient parallel 
processing of basic instructions, thereby increasing the 
reliability of the operation. The instruction issue unit 96 
further including the conversion unit 144 facilitates the 
issuance of basic instructions to the instruction execu- 
tion units, thereby increasing the operation speed. 
[0170] Additionally, the circuit size of the parallel 
processor 44 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 70, as in the foregoing examples. 

(Example 6) 

[0171] FIG. 39 shows the structure of a sixth exam- 
ple of the parallel processor in accordance with the fifth 
embodiment of the present invention. As shown in FIG. 
39, the parallel processor 45 of this example has the 
same structure as the parallel processor 27 shown in 
FIG. 20. The instruction fetch unit 71 includes a first 
conversion unit 145, and the instruction issue unit 97 
includes a second conversion unit 146 and a judgment 
unit 219.

[0172] The structures and operations of the first 
conversion unit 145 and the second conversion unit 146 
are the same as the structures and operations of the 
first conversion unit 117 and the second conversion unit
118 shown in FIG. 14. On the other hand, the structure
and operation of the judgment unit 219 are the same as
the structure and operation of the judgment unit 103 
shown in FIG. 6. 

[0173] By the parallel processor 45 of this example 
having the above structure, the same effects as 
obtained by the parallel processor 27 of Example 6 of 
the second embodiment can be obtained. More specifi- 
cally, the instruction issue unit 97 including the judg- 
ment unit 219 enables accurate and efficient parallel 
processing of basic instructions, thereby increasing the 
reliability of the operation. The instruction fetch unit 71 
including the first conversion unit 145 and the instruc- 
tion issue unit 97 including the second conversion unit 
146 facilitate the issuance of basic instructions from the
instruction issue unit 97 to the instruction execution 
units, thereby increasing the operation speed. 
[0174] Additionally, the circuit size of the parallel 
processor 45 may be reduced by restricting in advance 
the arrangement of basic instructions contained in each 
instruction word to be supplied to the instruction fetch 
unit 71, as in the foregoing examples.
[0175] The present invention is not limited to the 
specifically disclosed embodiments, but variations and 
modifications may be made without departing from the 
scope of the present invention.
[0176] The present application is based on Japa-
nese priority application No. 11-281957, filed on Octo-
ber 1, 1999, the entire contents of which are hereby
incorporated by reference. 

Claims 

1. A parallel processor that performs parallel processing 
of one or more basic instructions contained in 
each of instruction words delimited by instruction 
delimiting information, said parallel processor 
characterized by comprising: 

a plurality of instruction execution units that 
perform processes corresponding to supplied 
basic instructions in parallel; 
an instruction fetch unit that fetches the instruction 
words one by one in accordance with the 
instruction delimiting information; and 
an instruction issue unit that selectively issues 
each of the basic instructions supplied from the 
instruction fetch unit to one of the instruction 
execution units to execute an issued basic 
instruction. 
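
As an illustration only, the fetch-and-issue behaviour recited in claim 1 can be modelled in software roughly as follows. The sketch assumes that the instruction delimiting information is a one-bit end flag carried by each basic instruction and that each basic instruction names the execution unit it targets; the claim fixes neither encoding, and the names BasicInstruction, fetch_words and issue are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class BasicInstruction:
    opcode: str
    unit: int   # assumed: index of the execution unit that should run it
    end: bool   # assumed delimiting information: True on the last instruction of a word


def fetch_words(stream: List[BasicInstruction]) -> Iterator[List[BasicInstruction]]:
    """Yield instruction words one by one, cutting the stream at each end flag."""
    word: List[BasicInstruction] = []
    for ins in stream:
        word.append(ins)
        if ins.end:
            yield word
            word = []
    if word:  # trailing instructions without an end flag form a final word
        yield word


def issue(word: List[BasicInstruction], n_units: int) -> List[str]:
    """Route each basic instruction of a word to the slot of its execution unit.

    Units that receive nothing from this word are marked "idle" in this model.
    """
    slots = ["idle"] * n_units
    for ins in word:
        if slots[ins.unit] != "idle":
            raise ValueError(f"unit {ins.unit} already has an instruction in this word")
        slots[ins.unit] = ins.opcode
    return slots


program = [
    BasicInstruction("add", 0, False),
    BasicInstruction("load", 2, True),   # end flag closes the first word
    BasicInstruction("mul", 1, True),    # a one-instruction word
]
issued = [issue(w, n_units=3) for w in fetch_words(program)]
```

In this model a word occupies only as many entries as it has basic instructions, which is why the second word can consist of a single instruction.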

2. The parallel processor as claimed in claim 1, char- 
acterized in that the plurality of instruction execution 
units all have the same structure. 

3. The parallel processor as claimed in claim 1, 
characterized in that: 

at least two of the instruction execution units 
have different structures from each other; and 
the instruction fetch unit rearranges the basic 
instructions contained in each of the fetched 
instruction words, in accordance with arrangement 
of the plurality of instruction execution 
units, and then supplies the rearranged basic 
instructions to the instruction issue unit. 

4. The parallel processor as claimed in claim 1, 
characterized in that: 

at least two of the instruction execution units 
have different structures from each other; and 
the instruction issue unit rearranges the basic 
instructions contained in each of the instruction 
words supplied from the instruction fetch unit, 
in accordance with arrangement of the plurality 
of instruction execution units, and then supplies 
the rearranged basic instructions to the instruction 
execution units. 

5. The parallel processor as claimed in claim 1, 
characterized in that: 

at least two of the instruction execution units 
have different structures from each other; 
the instruction fetch unit rearranges the basic 
instructions contained in each of the fetched 
instruction words, in accordance with arrangement 
of the instruction execution units, and 
then supplies the rearranged basic instructions 
to the instruction issue unit; and 
the instruction issue unit further rearranges the 
basic instructions contained in each of the 
instruction words supplied from the instruction 
fetch unit, in accordance with the arrangement 
of the instruction execution units, and then supplies 
the rearranged basic instructions to the 
instruction execution units. 

6. The parallel processor as claimed in one of claims 
3 to 5, characterized in that: 

at least two of the instruction execution units 
have different structures from each other; and 
the instruction fetch unit fetches an instruction 
word that contains basic instructions arranged in 
advance in accordance with the arrangement 
of the instruction execution units. 
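
The rearrangement recited in claims 3 to 6 can be pictured with the following minimal sketch. It assumes that each execution unit has a "kind" (here alu, mem and branch, which are illustrative labels not taken from the claims) and that each basic instruction states the kind of unit it needs; the function names rearrange and is_prearranged are hypothetical, and the first-free-slot policy is just one possible choice.

```python
from typing import List, Optional, Sequence, Tuple

Instr = Tuple[str, str]  # (opcode, kind), where kind is the assumed unit type it needs


def rearrange(word: Sequence[Instr], unit_kinds: Sequence[str]) -> List[Optional[Instr]]:
    """Place each basic instruction in the first free slot whose unit kind matches."""
    slots: List[Optional[Instr]] = [None] * len(unit_kinds)
    for instr in word:
        for i, kind in enumerate(unit_kinds):
            if slots[i] is None and kind == instr[1]:
                slots[i] = instr
                break
        else:  # no break: every matching unit is already taken
            raise ValueError(f"no free {instr[1]} unit for {instr[0]}")
    return slots


def is_prearranged(word: Sequence[Instr], unit_kinds: Sequence[str]) -> bool:
    """True if the word already lists its instructions in unit order (the claim 6 case)."""
    placed = [s for s in rearrange(word, unit_kinds) if s is not None]
    return placed == list(word)


UNITS = ["alu", "alu", "mem", "branch"]  # assumed arrangement of execution units
word = [("jmp", "branch"), ("ld", "mem"), ("add", "alu")]
arranged = rearrange(word, UNITS)
```

When the words are arranged in advance, as in claim 6, a check such as is_prearranged returns True and no reordering is needed at run time, which is consistent with the reduction in circuit size mentioned in the description.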

7. The parallel processor as claimed in one of claims 
1 to 6, characterized in that, depending on the type 
of a basic instruction being currently executed by 
one of the instruction execution units, the instruc- 
tion issue unit issues a next basic instruction before 
the execution of a basic instruction being currently 
executed is completed. 

8. The parallel processor as claimed in claim 7, char- 
acterized in that, if a supplied basic instruction does 
not have data dependency or control dependency, 
or does not share resources with a basic instruction 
being currently executed by one of the instruction 
execution units, the instruction issue unit issues the 
supplied basic instruction before the execution of 
the basic instruction being currently executed is 
completed. 
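
The early-issue condition of claim 8 can be sketched as a simple predicate. This is an assumption-laden model, not the circuit of the judgment unit: it takes the conservative reading that a supplied instruction is issued early only when it has no data dependency, no control dependency and no shared resource with the instruction still executing. The fields dest, srcs, resource and is_branch, and the function name can_issue_early, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Instr:
    dest: FrozenSet[str]      # registers written
    srcs: FrozenSet[str]      # registers read
    resource: str             # assumed: name of the hardware resource it occupies
    is_branch: bool = False   # assumed marker used to model control dependency


def can_issue_early(nxt: Instr, running: Instr) -> bool:
    """Return True if `nxt` may be issued while `running` is still executing."""
    data_dep = bool(
        running.dest & nxt.srcs      # read after write
        or running.srcs & nxt.dest   # write after read
        or running.dest & nxt.dest   # write after write
    )
    control_dep = running.is_branch  # branch outcome is not yet known
    shares_resource = running.resource == nxt.resource
    return not (data_dep or control_dep or shares_resource)


div = Instr(frozenset({"r1"}), frozenset({"r2", "r3"}), "divider")
independent_add = Instr(frozenset({"r4"}), frozenset({"r5"}), "alu")
dependent_add = Instr(frozenset({"r6"}), frozenset({"r1"}), "alu")
```

Here independent_add could be issued while the long-running div is still in flight, whereas dependent_add reads r1 and would have to wait.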